Results 1 - 20 of 27
1.
Pattern Recognit ; 143: 109732, 2023 Nov.
Article in English | MEDLINE | ID: covidwho-20231102

ABSTRACT

Intelligent diagnosis of novel coronavirus disease (COVID-19) has been widely studied. Existing deep models typically do not make full use of global features, such as large areas of ground-glass opacity, or local features, such as local bronchiolectasis, in COVID-19 chest CT images, leading to unsatisfactory recognition accuracy. To address this challenge, this paper proposes a novel method for diagnosing COVID-19 using momentum contrast and knowledge distillation, termed MCT-KD. Our method takes advantage of the Vision Transformer to design a momentum contrastive learning task that effectively extracts global features from COVID-19 chest CT images. Moreover, in the transfer and fine-tuning process, we integrate the locality of convolution into the Vision Transformer via a specially designed knowledge distillation. These strategies enable the final Vision Transformer to focus simultaneously on global and local features in COVID-19 chest CT images. In addition, because momentum contrastive learning is self-supervised, it alleviates the difficulty of training Vision Transformers on small datasets. Extensive experiments confirm the effectiveness of the proposed MCT-KD. In particular, MCT-KD achieves 87.43% and 96.94% accuracy on two publicly available datasets, respectively.
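The abstract names two training signals: a momentum contrastive pretext task and a knowledge distillation step that injects convolutional locality into the ViT. Below is a minimal, hedged sketch of the distillation half only, using the standard soft-target formulation; the temperature, weighting, and two-class head are illustrative assumptions, not the authors' released configuration.

```python
# Sketch: a CNN teacher's soft predictions guide a ViT student so the student
# also picks up local cues. Hyper-parameters here are assumptions.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Soft-target knowledge distillation blended with ordinary cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)                          # rescale so gradients stay comparable
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# toy usage with random tensors standing in for model outputs
s = torch.randn(8, 2)                    # ViT student logits (COVID vs non-COVID)
t = torch.randn(8, 2)                    # CNN teacher logits
y = torch.randint(0, 2, (8,))
print(distillation_loss(s, t, y))
```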

2.
Comput Med Imaging Graph ; 108: 102258, 2023 Jun 03.
Article in English | MEDLINE | ID: covidwho-20230632

ABSTRACT

Lung cancer has the highest mortality rate among cancers. Its diagnosis and treatment planning depend upon accurate segmentation of the tumor, which is tedious when done manually, as radiologists are overburdened with numerous medical imaging tests due to the increase in cancer patients and the COVID pandemic. Automatic segmentation techniques play an essential role in assisting medical experts. Segmentation approaches based on convolutional neural networks have provided state-of-the-art performance; however, they cannot capture long-range relations due to the region-based convolutional operator. Vision Transformers can resolve this issue by capturing global multi-contextual features. To explore this advantageous feature of the vision transformer, we propose an approach for lung tumor segmentation using an amalgamation of the vision transformer and a convolutional neural network. We design the network as an encoder-decoder structure, with convolution blocks deployed in the initial layers of the encoder to capture features carrying essential information, and corresponding blocks in the final layers of the decoder. The deeper layers utilize transformer blocks with a self-attention mechanism to capture more detailed global feature maps. We use a recently proposed unified loss function that combines cross-entropy and Dice-based losses for network optimization. We trained our network on the publicly available NSCLC-Radiomics dataset and tested its generalizability on our dataset collected from a local hospital. We achieved average Dice coefficients of 0.7468 and 0.6847 and Hausdorff distances of 15.336 and 17.435 on the public and local test data, respectively.
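The abstract mentions a unified loss combining cross-entropy and Dice-based terms. The sketch below shows one common way to blend the two for binary tumor masks; the mixing weight lambda_dice is an assumption, since the paper's exact formulation is not given here.

```python
# Hedged sketch of a combined cross-entropy + soft-Dice segmentation loss.
import torch
import torch.nn.functional as F

def dice_loss(logits, targets, eps=1e-6):
    """Soft Dice loss; logits and targets shaped (N, 1, H, W), targets in {0, 1}."""
    probs = torch.sigmoid(logits)
    inter = (probs * targets).sum(dim=(1, 2, 3))
    union = probs.sum(dim=(1, 2, 3)) + targets.sum(dim=(1, 2, 3))
    return 1.0 - ((2.0 * inter + eps) / (union + eps)).mean()

def unified_loss(logits, targets, lambda_dice=0.5):
    # lambda_dice is an assumed weighting between the two terms
    ce = F.binary_cross_entropy_with_logits(logits, targets)
    return (1.0 - lambda_dice) * ce + lambda_dice * dice_loss(logits, targets)

logits = torch.randn(2, 1, 64, 64)                    # network output
masks = torch.randint(0, 2, (2, 1, 64, 64)).float()   # ground-truth masks
print(unified_loss(logits, masks))
```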

3.
28th International Computer Conference, Computer Society of Iran, CSICC 2023 ; 2023.
Article in English | Scopus | ID: covidwho-2323458

ABSTRACT

Choosing a proper outfit is one of the problems we deal with every day. Today, people tend to shop on online websites, and the COVID-19 pandemic reinforced this trend. In this research, we propose a new architecture for multi-fashion-item retrieval from a website database. We deploy a CLIP transformer model instead of convolutional neural networks in a triplet network. We also add a long short-term memory network (LSTM) to automatically extract and encode image features, generating descriptive text for each input image. Our OutCLIP model accomplishes multi-item retrieval with 83% precision and 85% recall. This model can be trained and used in fashion-retrieval problems and improves upon previously proposed models. Considering the descriptive text and the image together gives the model a better understanding of the concept and improves its generalization. © 2023 IEEE.
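As a rough illustration of the triplet setup described above, the snippet below applies a standard triplet margin loss to image embeddings. Random tensors stand in for features from a shared CLIP image encoder, and the margin value is an assumption.

```python
# Hedged sketch of the triplet objective; in practice anchor/positive/negative
# would each come from the same (frozen or fine-tuned) CLIP image tower.
import torch
import torch.nn as nn

triplet = nn.TripletMarginLoss(margin=0.2)   # margin is an assumed value

anchor   = torch.randn(16, 512)  # embedding of the query outfit image
positive = torch.randn(16, 512)  # embedding of a matching catalogue item
negative = torch.randn(16, 512)  # embedding of a non-matching item

loss = triplet(anchor, positive, negative)   # pulls matches together, pushes others apart
print(loss)
```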

4.
Mathematics ; 11(6), 2023.
Article in English | Scopus | ID: covidwho-2300650

ABSTRACT

Early illness detection enables medical professionals to deliver the best care and increases the likelihood of a full recovery. In this work, we show that computer-aided diagnosis (CAD) systems can use chest X-ray (CXR) medical imaging to identify respiratory system disorders. At present, the COVID-19 pandemic is the most well-known such illness. We propose a system based on explainable artificial intelligence to detect COVID-19 in CXR images by using several cutting-edge convolutional neural network (CNN) models, as well as Vision Transformer (ViT) models. The proposed system also visualizes the infected areas of the CXR images, giving doctors and other medical professionals a second opinion to support their decisions. The proposed system applies preprocessing to the images, including segmentation of the region of interest using a UNet model and rotation augmentation. CNNs employ pixel arrays, while ViTs divide the image into visual tokens; therefore, one of the objectives is to compare their performance in COVID-19 detection. In the experiments, a publicly available dataset (COVID-QU-Ex) is used. The experimental results show that the performances of the CNN-based models and the ViT-based models are comparable. The best accuracy was 99.82%, obtained by the EfficientNetB7 (CNN-based) model, followed by the SegFormer (ViT-based) model. In addition, the segmentation and augmentation enhanced performance. © 2023 by the authors.
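The abstract contrasts CNNs operating on pixel arrays with ViTs operating on visual tokens. The short sketch below makes that difference concrete; the patch size, embedding width, and input resolution are illustrative assumptions.

```python
# Minimal contrast between ViT tokenization and a CNN's dense convolution.
import torch
import torch.nn as nn

img = torch.randn(1, 3, 224, 224)                   # one preprocessed CXR image

# ViT-style tokenization: non-overlapping 16x16 patches -> 196 visual tokens
patchify = nn.Conv2d(3, 768, kernel_size=16, stride=16)
tokens = patchify(img).flatten(2).transpose(1, 2)   # (1, 196, 768)

# CNN-style first layer: overlapping receptive fields over the pixel array
conv = nn.Conv2d(3, 64, kernel_size=3, padding=1)
fmap = conv(img)                                    # (1, 64, 224, 224)
print(tokens.shape, fmap.shape)
```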

5.
3rd International Symposium on Instrumentation, Control, Artificial Intelligence, and Robotics, ICA-SYMP 2023 ; : 127-130, 2023.
Article in English | Scopus | ID: covidwho-2275520

ABSTRACT

One of the difficult challenges in AI development is making machines understand human feelings through expression, because humans can express feelings in various ways, for example, through voice, facial actions, or behavior. Facial Emotion Recognition (FER) has been used in interrogating suspects, as a tool to help detect emotions in people with nerve damage, and even during the COVID-19 pandemic when patients hid their timelines. It can also be applied to detect lies through micro-expressions. This work mainly focuses on FER. The results of a Deep Neural Network (DNN), a Convolutional Neural Network (CNN), and a Vision Transformer were compared. Human emotion expressions were classified using facial expression datasets from AffectNet, Tsinghua, Extended Cohn-Kanade (CK+), Karolinska Directed Emotional Faces (KDEF), and Real-world Affective Faces (RAF). Finally, all models were evaluated on the testing dataset to confirm their performance. The results show that the Vision Transformer model outperforms the other models. © 2023 IEEE.

6.
Evolving Systems ; 2023.
Article in English | Scopus | ID: covidwho-2269831

ABSTRACT

The lungs of patients with COVID-19 exhibit distinctive lesion features in chest CT images. Fast and accurate segmentation of lesion sites from lung CT images is significant for the diagnosis and monitoring of COVID-19 patients. To this end, we propose a progressive dense residual fusion network named PDRF-Net for COVID-19 lung CT segmentation. Dense skip connections are introduced to capture multi-level contextual information and compensate for feature loss during propagation through the network. An efficient aggregated residual module is designed for the encoding-decoding structure; it combines a vision transformer and a residual block to enable the network to extract richer, fine-detail features from CT images. Furthermore, we introduce a bilateral channel pixel-weighted module to progressively fuse the feature maps obtained from multiple branches. The proposed PDRF-Net obtains good segmentation results on two COVID-19 datasets, exceeding the baseline by 11.6% and 11.1%, respectively, and outperforming other mainstream methods. Thus, PDRF-Net serves as an easy-to-train, high-performance deep learning model that can realize effective segmentation of COVID-19 lung CT images. © 2023, The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature.

7.
2022 IEEE International Conference on Trends in Quantum Computing and Emerging Business Technologies, TQCEBT 2022 ; 2022.
Article in English | Scopus | ID: covidwho-2261667

ABSTRACT

Early detection of pneumonia through effective medical imaging may enable timely remedial measures and reduce the severity of the infection. Cases have increased in recent years among newborns, teenagers, and people with underlying health issues. The COVID-19 pandemic also revealed the major impact pneumonia has on the lungs and the consequences of delayed detection. The presence of the infection in the lungs is typically examined through chest X-ray images; for earlier diagnosis, however, this paper proposes an automated model as a more effective alternative. The Convolutional Vision Transformer (CVT), a robust combination of convolution and the Vision Transformer (ViT), is suggested as a potential model for detecting pneumonia early in patients, achieving an accuracy of 97.13%. © 2022 IEEE.

8.
2022 International Conference on Frontiers of Information Technology, FIT 2022 ; : 82-87, 2022.
Article in English | Scopus | ID: covidwho-2287687

ABSTRACT

In the current pandemic, precise and early diagnosis of COVID-19 patients has remained a crucial task for controlling the spread of the virus in the healthcare sector. Due to the unexpected spike in COVID-19 cases, the majority of countries have experienced testing shortages and poor testing rates. Chest X-rays and CT scans have been discussed in the literature as viable sources for testing patients for COVID-19. However, manually reviewing CT and X-ray images is time-consuming and prone to error. Taking these constraints and the improvements in data science into account, this research proposes a Vision Transformer-based deep learning pipeline for COVID-19 diagnosis from CT imaging. Due to the scarcity of large datasets, three open-source datasets of CT scans were pooled to generate 27,370 images of COVID and non-COVID individuals. The proposed Vision Transformer-based model accurately distinguishes COVID-19 from normal chest CT images with an accuracy of 98 percent. This research would assist practitioners, radiologists, and doctors in the early and accurate diagnosis of COVID-19. © 2022 IEEE.

9.
Pattern Recognit Lett ; 164: 173-182, 2022 Dec.
Article in English | MEDLINE | ID: covidwho-2246515

ABSTRACT

As wearing face masks is becoming an embedded practice due to the COVID-19 pandemic, facial expression recognition (FER) that takes face masks into account is now a problem that needs to be solved. In this paper, we propose a face parsing and vision Transformer-based method to improve the accuracy of face-mask-aware FER. First, in order to improve the precision of distinguishing the unobstructed facial region as well as those parts of the face covered by a mask, we re-train a face-mask-aware face parsing model, based on an existing face parsing dataset automatically relabeled with face-mask pixel labels. Second, we propose an FER classifier based on a vision Transformer with a cross-attention mechanism, capable of taking both occluded and non-occluded facial regions into account and reweighting these two parts automatically to obtain the best facial expression recognition performance. The proposed method outperforms existing state-of-the-art face-mask-aware FER methods, as well as other occlusion-aware FER methods, on two datasets that contain three kinds of emotions (the M-LFW-FER and M-KDDI-FER datasets) and two datasets that contain seven kinds of emotions (the M-FER-2013 and M-CK+ datasets).
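The cross-attention idea above can be sketched as one region's tokens querying the other's. In the snippet below, the token counts, embedding width, and head count are assumptions for illustration, not the paper's configuration.

```python
# Hedged sketch: tokens from the visible (unobstructed) facial region attend
# to tokens from the mask-covered region, fusing the two streams.
import torch
import torch.nn as nn

d = 256
cross_attn = nn.MultiheadAttention(embed_dim=d, num_heads=4, batch_first=True)

eyes_tokens = torch.randn(2, 49, d)   # features parsed from the unobstructed region
mask_tokens = torch.randn(2, 49, d)   # features parsed from the mask-covered region

# visible-region tokens query the occluded-region tokens
fused, _ = cross_attn(query=eyes_tokens, key=mask_tokens, value=mask_tokens)
print(fused.shape)                    # (2, 49, 256)
```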

10.
Intelligent Systems with Applications ; 17, 2023.
Article in English | Scopus | ID: covidwho-2231351

ABSTRACT

The COVID-19 pandemic has disrupted various levels of society. Wearing a mask is essential to preventing the spread of COVID-19, and identifying from an image whether a person is wearing a mask correctly can help enforce this. Since only 23.1% of people use masks correctly, Artificial Neural Networks (ANNs) can help classify correct mask usage and thereby slow the spread of the COVID-19 virus. However, training an ANN that classifies mask usage correctly requires a large dataset. MaskedFace-Net is a suitable dataset consisting of 137,016 digital images with four class labels: Mask, Mask Chin, Mask Mouth Chin, and Mask Nose Mouth. Mask-classification training utilizes the Vision Transformer (ViT) architecture with transfer learning from weights pre-trained on ImageNet-21k, with random augmentation. In addition, training hyper-parameters of 20 epochs, a Stochastic Gradient Descent (SGD) optimizer with a learning rate of 0.03, a batch size of 64, a Gaussian Error Linear Unit (GELU) activation function, and a cross-entropy loss function are applied to the training of three ViT architectures: Base-16, Large-16, and Huge-14. Furthermore, comparisons with and without augmentation and transfer learning are conducted. This study found that the best classification uses transfer learning and augmentation with ViT Huge-14. Using this method on the MaskedFace-Net dataset, the research reaches an accuracy of 0.9601 on training data, 0.9412 on validation data, and 0.9534 on test data. This research shows that training the ViT model with data augmentation and transfer learning improves mask-usage classification, performing even better than a convolution-based Residual Network (ResNet). © 2023 The Author(s)
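The training recipe above (SGD with learning rate 0.03, batch size 64, 20 epochs, transfer learning from pre-trained weights) translates into a short fine-tuning loop. In the sketch below, torchvision's vit_b_16 with ImageNet-1k weights stands in for the ImageNet-21k Base-16 checkpoint, and a tiny fake batch replaces the real data loader; both substitutions are assumptions.

```python
# Hedged sketch of fine-tuning a pre-trained ViT for the four mask classes.
import torch
import torch.nn as nn
from torchvision.models import vit_b_16, ViT_B_16_Weights

model = vit_b_16(weights=ViT_B_16_Weights.IMAGENET1K_V1)        # stand-in checkpoint
model.heads.head = nn.Linear(model.heads.head.in_features, 4)   # 4 mask classes

optim = torch.optim.SGD(model.parameters(), lr=0.03, momentum=0.9)
loss_fn = nn.CrossEntropyLoss()

# one illustrative step on a fake mini-batch; real training loops over
# the full dataset for 20 epochs at batch size 64
x = torch.randn(4, 3, 224, 224)
y = torch.randint(0, 4, (4,))
optim.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
optim.step()
print(loss.item())
```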

11.
Med Biol Eng Comput ; 61(6): 1395-1408, 2023 Jun.
Article in English | MEDLINE | ID: covidwho-2220196

ABSTRACT

A long-standing challenge in pneumonia diagnosis is recognizing pathological lung texture, especially the ground-glass-appearance pathological texture. One main difficulty lies in precisely extracting and recognizing the pathological features. Because patients, especially those with mild symptoms, show very little difference in lung texture, neither conventional computer vision methods nor convolutional neural networks perform well in pneumonia diagnosis based on chest X-ray (CXR) images. Meanwhile, the Coronavirus Disease 2019 (COVID-19) pandemic continues wreaking havoc around the world, and quick, accurate diagnosis backed by CXR images is in high demand. Rather than simply recognizing patterns, extracting feature maps from the original CXR image is what the classification process needs. Thus, we propose a Vision Transformer (ViT)-based model called PneuNet to make accurate diagnoses, backed by channel-based attention, from lung X-ray images, where multi-head attention is applied to channel patches rather than feature patches. The techniques presented in this paper are oriented toward the medical application of deep neural networks and the ViT. Extensive experimental results show that our method reaches 94.96% accuracy on the three-category classification problem on the test set, outperforming previous deep learning models.


Subject(s)
COVID-19 , Deep Learning , Pneumonia , Humans , COVID-19/diagnostic imaging , X-Rays , SARS-CoV-2 , Algorithms , Pneumonia/diagnostic imaging , COVID-19 Testing
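PneuNet's defining twist, per the abstract above, is multi-head attention over channel patches rather than spatial feature patches. The sketch below shows one plausible reading: flattened feature-map channels serve as the token axis. Shapes and head count are illustrative assumptions.

```python
# Hedged sketch: treat each channel of a feature map as a token, so attention
# relates channels to one another instead of spatial patches.
import torch
import torch.nn as nn

feat = torch.randn(2, 64, 14, 14)           # CNN feature maps: (N, C, H, W)
tokens = feat.flatten(2)                    # (N, C, H*W): one token per channel

attn = nn.MultiheadAttention(embed_dim=14 * 14, num_heads=4, batch_first=True)
out, weights = attn(tokens, tokens, tokens) # self-attention across the C axis
print(out.shape)                            # (2, 64, 196)
```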
12.
2022 IEEE-EMBS International Conference on Biomedical and Health Informatics, BHI 2022 ; 2022.
Article in English | Scopus | ID: covidwho-2161373

ABSTRACT

The fast proliferation of the coronavirus around the globe has put several countries' healthcare systems in danger of collapsing. As a result, locating and isolating COVID-19-positive patients is a critical task. Deep learning approaches have been used in several computer-aided automated systems that utilize chest computed tomography (CT) scans or X-ray images as diagnostic tools. However, current Convolutional Neural Network (CNN) based approaches cannot capture global context because of their inherent image-specific inductive bias. These techniques also require large, labeled datasets to train the algorithm, but few labeled COVID-19 datasets exist publicly. To mitigate the problem, we have developed a self-attention-based Vision Transformer (ViT) architecture using CT scans. The proposed ViT model achieves an accuracy of 98.39% on the popular SARS-CoV-2 dataset, outperforming the existing state-of-the-art CNN-based models by 1%. We also describe the characteristics of CT scan images of COVID-19-affected patients and provide an error analysis of the model's outcome. Our findings show that the proposed ViT-based model can be an alternative option for medical professionals for effective COVID-19 screening. The implementation details of the proposed model can be accessed at https://github.com/Pranabiitp/ViT. © 2022 IEEE.

13.
3rd International Conference on Next Generation Computing Applications, NextComp 2022 ; 2022.
Article in English | Scopus | ID: covidwho-2136450

ABSTRACT

This paper presents an explainable deep learning network to classify COVID from non-COVID cases based on 3D CT lung images. A subset of the data from the MIA-COV19 challenge is used to develop a 3D form of the Vision Transformer deep learning architecture. The data comprise 1,924 subjects, 851 of whom were diagnosed with COVID; 1,552 were selected for training and 372 for testing. While most of the data volumes are in axial view, a number of subjects' data are in coronal or sagittal views with only one or two slices in axial view. Hence, although 3D-data-based classification is investigated, 2D axial-view images remain the main focus in this competition. Two deep learning methods are studied: the vision transformer (ViT), based on attention models, and DenseNet, built upon a conventional convolutional neural network (CNN). Initial evaluation results indicate that ViT performs better than DenseNet, with F1 scores of 0.81 and 0.72, respectively. (Code is available on GitHub at https://github.com/xiaohong1/COVID-ViT.) This paper illustrates that the vision transformer performs best in comparison to the other current state-of-the-art approaches in classifying COVID from CT lung images. © 2022 IEEE.

14.
4th IEEE International Conference on Artificial Intelligence in Engineering and Technology, IICAIET 2022 ; 2022.
Article in English | Scopus | ID: covidwho-2136364

ABSTRACT

The fast proliferation of coronavirus disease 2019 (COVID-19) has pushed many countries' healthcare systems to the brink of disaster, making it a necessity to automate screening procedures and reduce the ongoing cost to healthcare systems. Although the use of Convolutional Neural Networks (CNNs) is gaining attention in the field of COVID-19 diagnosis based on medical images, these models have disadvantages due to their image-specific inductive bias, in contrast to the Vision Transformer (ViT). This paper conducts a comparative study of three of the most established CNN models and a ViT for classifying COVID-19 and non-COVID-19 cases. The study uses 2,481 computed tomography (CT) images: 1,252 from COVID-19 patients and 1,229 from non-COVID-19 patients. Confusion matrices and performance metrics were used to analyze the models. The experimental results show that all the pre-trained CNNs (VGG16, ResNet50, and InceptionV3) outperformed the pre-trained ViT model, with InceptionV3 as the best-performing model (99.20% accuracy). © 2022 IEEE.

15.
Front Microbiol ; 13: 1024104, 2022.
Article in English | MEDLINE | ID: covidwho-2142119

ABSTRACT

Since the outbreak of COVID-19, hundreds of millions of people have been infected, causing millions of deaths and heavily impacting the daily lives of countless people. Accurately identifying patients and taking timely isolation measures are necessary to stop the spread of COVID-19. Besides the nucleic acid test, lung CT image detection is also a path to quickly identifying COVID-19 patients. In this context, deep learning technology can help radiologists identify COVID-19 patients from CT images rapidly. In this paper, we propose a deep learning ensemble framework called VitCNX that combines the Vision Transformer and ConvNeXt for COVID-19 CT image identification. We compared our proposed VitCNX model with EfficientNetV2, DenseNet, ResNet-50, and Swin-Transformer, which are state-of-the-art deep learning models in the field of image classification, and with the two individual models used in the ensemble (Vision Transformer and ConvNeXt), in binary and three-class classification experiments. In the binary classification experiment, VitCNX achieves the best recall of 0.9907, accuracy of 0.9821, F1-score of 0.9855, AUC of 0.9985, and AUPR of 0.9991, outperforming the other six models. Likewise, in the three-class classification experiment, VitCNX achieves the best precision of 0.9668, accuracy of 0.9696, and F1-score of 0.9631, further demonstrating its excellent image classification capability. We hope our proposed VitCNX model can contribute to the recognition of COVID-19 patients.

16.
2022 Asia Conference on Algorithms, Computing and Machine Learning, CACML 2022 ; : 505-511, 2022.
Article in English | Scopus | ID: covidwho-2051936

ABSTRACT

Masked face recognition, a non-contact biometric technology, attracted much attention and developed rapidly during the coronavirus disease 2019 (COVID-19) outbreak. Existing work trains masked face recognition models on large numbers of 2D masked face images. However, in practical application scenarios, it is difficult to obtain a large number of masked face images in a short period of time. Therefore, combining 3D face recognition technology, this paper proposes a masked face recognition model trained with non-masked face images. We locate and segment the complete face region, and the face region not occluded by a mask, from face point clouds. The geometric features of the 3D face surface, namely depth, azimuth, and elevation, are extracted from these two regions to generate training data. The proposed vision-Transformer-based masked face recognition model divides the complete faces and partial faces into image sequences, and then captures the relationships between the image slices to compensate for the missing face information, thereby improving recognition performance. Comparative experiments with state-of-the-art masked face recognition work are carried out on four databases. The experimental results show that the recognition accuracy of the proposed model is improved by 9.86% on the Bosphorus database, 16.77% on the CASIA-3D FaceV1 database, 2.32% on the StirlingESRC database, and 34.81% on the Ajmal main database, which verifies the effectiveness and stability of the proposed model. © 2022 IEEE.
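The abstract extracts depth, azimuth, and elevation features from face point-cloud regions. The snippet below sketches how per-point surface normals convert into azimuth and elevation channels; the random normals and depths are placeholders for values estimated from a real point cloud, and this reading of the three features is an assumption.

```python
# Hedged sketch: turn per-point surface normals into azimuth/elevation values
# and stack them with depth as three geometric channels.
import numpy as np

normals = np.random.randn(1000, 3)                    # placeholder unit normals
normals /= np.linalg.norm(normals, axis=1, keepdims=True)
z = np.random.rand(1000)                              # placeholder depth per point

azimuth = np.arctan2(normals[:, 1], normals[:, 0])    # angle in the x-y plane
elevation = np.arcsin(np.clip(normals[:, 2], -1, 1))  # angle out of the x-y plane
features = np.stack([z, azimuth, elevation], axis=1)  # (N, 3) per-point features
print(features.shape)
```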

17.
Comput Methods Programs Biomed ; 226: 107141, 2022 Nov.
Article in English | MEDLINE | ID: covidwho-2031211

ABSTRACT

BACKGROUND AND OBJECTIVE: Chest X-ray imaging is a relatively cheap and accessible diagnostic tool that can assist in the diagnosis of various conditions, including pneumonia, tuberculosis, COVID-19, and others. However, the requirement for expert radiologists to view and interpret chest X-ray images can be a bottleneck, especially in remote and deprived areas. Recent advances in machine learning have made possible the automated diagnosis of chest X-ray scans. In this work, we examine the use of a novel Transformer-based deep learning model for the task of chest X-ray image classification. METHODS: We first examine the performance of the Vision Transformer (ViT) state-of-the-art image classification machine learning model for the task of chest X-ray image classification, and then propose and evaluate the Input Enhanced Vision Transformer (IEViT), a novel enhanced Vision Transformer model that can achieve improved performance on chest X-ray images associated with various pathologies. RESULTS: Experiments on four chest X-ray image data sets containing various pathologies (tuberculosis, pneumonia, COVID-19) demonstrated that the proposed IEViT model outperformed ViT for all the data sets and variants examined, achieving an F1-score between 96.39% and 100%, and an improvement over ViT of up to +5.82% in terms of F1-score across the four examined data sets. IEViT's maximum sensitivity (recall) ranged between 93.50% and 100% across the four data sets, with an improvement over ViT of up to +3%, whereas IEViT's maximum precision ranged between 97.96% and 100% across the four data sets, with an improvement over ViT of up to +6.41%. CONCLUSIONS: Results showed that the proposed IEViT model outperformed all ViT's variants for all the examined chest X-ray image data sets, demonstrating its superiority and generalisation ability. Given the relatively low cost and the widespread accessibility of chest X-ray imaging, the use of the proposed IEViT model can potentially offer a powerful, but relatively cheap and accessible method for assisting diagnosis using chest X-ray images.


Subject(s)
X-Rays , Humans , COVID-19/diagnostic imaging , Deep Learning , Pneumonia/diagnostic imaging , SARS-CoV-2
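As a loose reading of the "input-enhanced" idea in the IEViT abstract above, the sketch below prepends a CNN-computed whole-image token to the ordinary patch tokens before a transformer layer. This is an illustrative interpretation, not the authors' published architecture; all sizes are assumptions.

```python
# Hedged sketch: a small CNN summarizes the whole X-ray into one extra token
# that the transformer can attend to alongside the patch tokens.
import torch
import torch.nn as nn

img = torch.randn(1, 3, 224, 224)
patch_embed = nn.Conv2d(3, 384, kernel_size=16, stride=16)
tokens = patch_embed(img).flatten(2).transpose(1, 2)         # (1, 196, 384)

cnn_summary = nn.Sequential(                                  # global image summary
    nn.Conv2d(3, 384, 7, stride=4, padding=3), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),                    # (1, 384)
)
img_token = cnn_summary(img).unsqueeze(1)                     # (1, 1, 384)

enhanced = torch.cat([img_token, tokens], dim=1)              # prepend image token
block = nn.TransformerEncoderLayer(d_model=384, nhead=6, batch_first=True)
print(block(enhanced).shape)                                  # (1, 197, 384)
```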
18.
2022 IEEE International Conference on Communications, ICC 2022 ; 2022-May:613-618, 2022.
Article in English | Scopus | ID: covidwho-2029235

ABSTRACT

As a consequence of the COVID-19 pandemic, the demand for telecommunication for remote learning/working and telemedicine has significantly increased. Mobile Edge Caching (MEC) in 6G networks has evolved as an efficient solution to meet the phenomenal growth of global mobile data traffic by bringing multimedia content closer to users. Although the massive connectivity enabled by MEC networks will significantly increase the quality of communications, several key challenges lie ahead. The limited storage of edge nodes, the large size of multimedia content, and time-variant user preferences make it critical to efficiently and dynamically predict content popularity so that the items most likely to be requested are stored before they are requested. Recent advancements in Deep Neural Networks (DNNs) have drawn much research attention to predicting content popularity in proactive caching schemes. Existing DNN models in this context, however, suffer from long-term dependencies, computational complexity, and unsuitability for parallel computing. To tackle these challenges, we propose an edge caching framework incorporating the attention-based Vision Transformer (ViT) neural network, referred to as Transformer-based Edge (TEDGE) caching, which, to the best of our knowledge, is studied here for the first time. Moreover, the TEDGE caching framework requires no data pre-processing or additional contextual information. Simulation results corroborate the effectiveness of the proposed TEDGE caching framework in comparison to its counterparts. © 2022 IEEE.
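As a rough sketch of transformer-based popularity prediction for proactive caching, the snippet below scores per-content request histories with a small transformer encoder and caches the top-k items. The window size, model width, and head design are all assumptions; the TEDGE paper's actual input encoding may differ.

```python
# Hedged sketch: score contents by attending over their request histories,
# then pre-cache the k items predicted to be most popular.
import torch
import torch.nn as nn

B, contents, window = 8, 32, 16
history = torch.randn(B, contents, window)       # recent request counts per item

proj = nn.Linear(window, 64)                     # one token per content item
encoder = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
head = nn.Linear(64, 1)

tokens = proj(history)                           # (B, 32, 64)
scores = head(encoder(tokens)).squeeze(-1)       # (B, 32) popularity logits

topk = scores.topk(k=5, dim=1).indices           # cache the 5 most popular items
print(topk.shape)                                # (8, 5)
```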

19.
Biocybern Biomed Eng ; 42(3): 1066-1080, 2022.
Article in English | MEDLINE | ID: covidwho-2007461

ABSTRACT

The polymerase chain reaction (PCR) test is not only time-intensive but also a contact method that puts healthcare personnel at risk; thus, contactless and fast detection tests are more valuable. Cough sound is an important indicator of COVID-19, and in this paper, a novel explainable scheme is developed for cough-sound-based COVID-19 detection. In the presented work, the cough recording is initially segmented into overlapping parts, and each segment of the input audio, which may contain other sounds, is labeled using the deep Yet Another Mobile Network (YAMNet) model. After labeling, the segments labeled as cough are cropped and concatenated to reconstruct the pure cough sound. Then, four fractal dimension (FD) calculation methods are employed to acquire FD coefficients from the cough sound with an overlapping sliding window, forming a matrix. The constructed matrices are then used to form the fractal dimension images. Finally, a pretrained vision transformer (ViT) model is used to classify the constructed images into COVID-19, healthy, and symptomatic classes. In this work, we demonstrate the performance of the ViT on cough-sound-based COVID-19 detection, and a visual explanation of the inner workings of the ViT model is shown. Three publicly available cough sound datasets, namely COUGHVID, VIRUFY, and COSWARA, are used in this study. We obtained 98.45%, 98.15%, and 97.59% accuracy for the COUGHVID, VIRUFY, and COSWARA datasets, respectively. Our model obtained the highest performance compared to the state-of-the-art methods and is ready to be tested in real-world applications.
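One of the four fractal-dimension methods could resemble the sliding-window Katz computation sketched below, which produces one row of the eventual "fractal dimension image". The abstract does not name the specific FD estimators, so Katz's method and the window sizes here are assumptions.

```python
# Hedged sketch: Katz fractal dimension over a sliding window of cough audio.
import numpy as np

def katz_fd(x):
    """Katz fractal dimension of a 1-D signal window."""
    dists = np.abs(np.diff(x))
    L = dists.sum()                       # total curve length
    d = np.abs(x - x[0]).max()            # farthest excursion from the start
    n = len(x) - 1                        # number of steps
    return np.log10(n) / (np.log10(n) + np.log10(d / L))

sig = np.random.randn(16000)              # placeholder: 1 s of cough audio at 16 kHz
win, hop = 256, 128                       # assumed window/hop sizes
fds = np.array([katz_fd(sig[i:i + win]) for i in range(0, len(sig) - win, hop)])
print(fds.shape)                          # one row of the eventual FD image
```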

20.
6th International Conference on Computer Vision and Image Processing, CVIP 2021 ; 1567 CCIS:501-511, 2022.
Article in English | Scopus | ID: covidwho-1971573

ABSTRACT

With the COVID-19 pandemic outbreak, most countries limited their grain exports, which resulted in acute food shortages and price escalation in many countries. An increase in agricultural production is important to control price escalation and reduce the number of people suffering from acute hunger. But crop losses due to pests and plant diseases have also been rising worldwide, in spite of various smart-agriculture solutions to control the damage. Among several approaches, computer-vision-based food security systems have shown promising performance, and some pilot projects have been successfully implemented to issue advisories to farmers based on image-based farm condition monitoring. Several image processing, machine learning, and deep learning techniques have been proposed by researchers for automatic disease detection and identification. Although recent deep learning solutions are quite promising, most are either inspired by ILSVRC architectures with high memory and computational requirements, or are light convolutional neural network (CNN) based models with a limited degree of generalization. Thus, building a lightweight and compact CNN-based model is a challenging task. In this paper, a transformer-based automatic disease detection model, "PlantViT", is proposed, which is a hybrid of a CNN and a Vision Transformer. The aim is to identify plant diseases from images of leaves by developing a Vision Transformer-based deep learning technique. The model combines the capabilities of CNNs and the Vision Transformer, whose core is a multi-head attention module. The model has been evaluated on two large-scale open-source plant disease detection datasets: PlantVillage and Embrapa. Experimental results show that the proposed model achieves 98.61% and 87.87% accuracy on the PlantVillage and Embrapa datasets, respectively. PlantViT obtains significant improvements over the current state-of-the-art methods in plant disease detection. © 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.
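In the spirit of the hybrid CNN-plus-ViT design the abstract describes, the sketch below runs light convolutional stages for local leaf features, then transformer encoder layers with multi-head attention for global context. Layer sizes and depths are assumptions, not the published PlantViT configuration; the 38 output classes reflect PlantVillage's label count.

```python
# Hedged sketch of a lightweight CNN front-end feeding a transformer encoder.
import torch
import torch.nn as nn

class HybridClassifier(nn.Module):
    def __init__(self, num_classes=38):
        super().__init__()
        self.cnn = nn.Sequential(                  # local feature extractor
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.proj = nn.Conv2d(128, 192, 1)         # match transformer width
        layer = nn.TransformerEncoderLayer(d_model=192, nhead=6, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(192, num_classes)

    def forward(self, x):
        f = self.proj(self.cnn(x))                 # (N, 192, H/4, W/4)
        tokens = f.flatten(2).transpose(1, 2)      # spatial positions as tokens
        return self.head(self.transformer(tokens).mean(dim=1))

model = HybridClassifier()
print(model(torch.randn(1, 3, 128, 128)).shape)    # (1, 38)
```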
